Footage from AJ+, “Hurricane Katrina Relived Through Media Footage”

A surprising discovery

“Female-named hurricanes cause significantly more deaths than male-named ones” (Jung et al., 2014)

  • According to the authors:
    1. Female-named hurricanes \(\rightarrow\) perceived as less risky and intense
    2. Gender stereotypes reduce preparedness (fewer precautions \(\rightarrow\) more deaths)

But then…

Despite theoretical justification and transparent data & code,
something still didn’t add up…

Where did the discrepancy stem from?

  • The original analysis was technically correct and reasonable

BUT

  • It relied on arbitrary analytical choices

  • Only 37 out of >1,700 valid specifications yielded a significant result (Simonsohn et al., 2020)

The problem

Researcher degrees of freedom: the equally justifiable choices a researcher makes during the research process, each of which can drastically alter the final results (Simmons et al., 2011).

The solution

Multiverse Analysis (Steegen et al., 2016)

  • Multiverse of analytical scenarios (data collection, coding and analysis)
  • Analysis and presentation of results from every plausible scenario

    • To assess robustness/fragility of results


Descriptive methods

Inferential methods

But there is still one issue…

  • When we test hundreds or thousands of scenarios…
  • The risk of false positives skyrockets, far exceeding the nominal \(\alpha = .05\)


\(\Rightarrow\) The Multiple Comparisons Problem


  • Descriptive Multiverse methods offer visualization without correction
    • No framework for Statistical Inference
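To see how quickly the risk grows, here is a minimal sketch under a simplifying assumption: the tests are treated as independent, so the chance of at least one false positive among m tests is 1 − (1 − α)^m. Multiverse specifications share the same data and are therefore correlated, so the real rate is typically lower than this figure, though still well above α. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Print the rate for a few multiverse sizes (1144 is the case-study size).
for m in (1, 10, 100, 1144):
    print(m, round(familywise_error_rate(m), 3))
```

With a single test the rate equals α; with ten independent tests it is already about 0.40.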

The proposed solution

PIMA

  • Formal P-value Adjustment for Multiple Comparisons
    • Mathematically corrects for the Garden of forking paths (numerous tests)
  • Strong Control of Type I Error Rate
    • Keeping rate of false positives \(\leq 5\%\)
  • Good Statistical Power
  • Enables Selective Inference
    • Allows us to safely generalize specific significant results to the population
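One generic way to obtain adjusted p-values that control the familywise error rate across correlated tests is a permutation-based max-T adjustment. The sketch below illustrates that general idea with one-sample sign-flipping. It is an assumption made for illustration, not PIMA's actual implementation, and `t_stats` and `maxt_adjust` are hypothetical names.

```python
import math
import random


def t_stats(rows):
    """One-sample t statistic (H0: mean = 0) for each column of `rows`."""
    n, m = len(rows), len(rows[0])
    stats = []
    for j in range(m):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        stats.append(mean / math.sqrt(var / n))
    return stats


def maxt_adjust(rows, n_flips=999, seed=0):
    """Single-step max-T adjusted p-values via random sign-flipping.

    Each row is one unit; each column is one test (for example, one
    specification). Flipping the sign of whole rows keeps the
    correlation between tests intact.
    """
    rng = random.Random(seed)
    observed = [abs(t) for t in t_stats(rows)]
    exceed = [1] * len(observed)  # the observed data counts as one flip
    for _ in range(n_flips):
        signs = [rng.choice((-1, 1)) for _ in rows]
        flipped = [[s * x for x in r] for s, r in zip(signs, rows)]
        max_t = max(abs(t) for t in t_stats(flipped))
        # Compare every observed statistic with the maximum over all tests.
        for j, t_obs in enumerate(observed):
            if max_t >= t_obs:
                exceed[j] += 1
    return [c / (n_flips + 1) for c in exceed]
```

Because every observed statistic is compared with the maximum over all tests, a test with a larger observed statistic never receives a larger adjusted p-value than a test with a smaller one.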

Multiverse Meta-Analysis

Research question: “Is psychotherapy effective for treating depression?”


  • Dataset: 124 primary studies (Randomized Controlled Trials)

| Study | n | yi | vi | Format | Diagnosis | Type | Control | RoB |
|---|---|---|---|---|---|---|---|---|
| barrett, 2001 | 74 | 0.27 | 0.07 | individual | diagnosis | cbt-based | other ctr | low |
| bolton, 2003 | 284 | 1.32 | 0.02 | group | diagnosis | not-cbt-based | cau | some concern |
| allart van dam, 2003 | 102 | 0.57 | 0.04 | group | subclinical depression | cbt-based | cau | some concern |
| baumgartner, 2021 | 455 | 0.32 | 0.01 | guided self-help | cut-off score | cbt-based | wl | low |
| arjadi, 2018 | 313 | 0.39 | 0.01 | guided self-help | diagnosis | cbt-based | other ctr | low |

Our Meta-Analysis

  • Selection criteria:

    • Therapy: Not-CBT only \(\rightarrow\) CBT already has enough evidence

    • Control: Care As Usual (CAU) only

    • Scope: All formats & diagnoses included

    • Risk of Bias: ‘Low Risk’ only \(\rightarrow\) Avoids inflated effect sizes

  • Statistical Specifications:

    • Model: Random-effects

    • \(\tau^2\) Estimator: REML (Restricted Maximum Likelihood)

    • Test: Knapp-Hartung adjustment \(\rightarrow\) Robust correction for small k (number of studies)

    • \(\rho\) (within-study correlation) = 0.5
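The statistical specification above can be sketched in a few lines. This is a minimal illustration, assuming one effect size per study (so the within-study correlation \(\rho\) is not used here), a standard fixed-point iteration for the REML estimate of \(\tau^2\), and the Knapp-Hartung standard error with k − 1 degrees of freedom. The function names are hypothetical, and the example values are the five `yi`/`vi` rows shown in the dataset excerpt, not the studies selected for the actual analysis.

```python
import math


def reml_tau2(y, v, tol=1e-10, max_iter=200):
    """REML estimate of between-study variance via fixed-point iteration."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def random_effects_kh(y, v):
    """Random-effects pooled estimate with the Knapp-Hartung standard error."""
    k = len(y)
    tau2 = reml_tau2(y, v)
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung: weighted residual variance scaled by (k - 1) * sum(w).
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    se = math.sqrt(q / ((k - 1) * sum(w)))
    t = mu / se if se > 0 else math.inf
    return {"estimate": mu, "tau2": tau2, "se": se, "t": t, "df": k - 1}


# Effect sizes (yi) and sampling variances (vi) from the dataset excerpt.
yi = [0.27, 1.32, 0.57, 0.32, 0.39]
vi = [0.07, 0.02, 0.04, 0.01, 0.01]
result = random_effects_kh(yi, vi)
print(result)
```

The returned `t` is meant to be compared against a t distribution with `df` degrees of freedom, which is what makes the test more conservative than a z test when k is small.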

Results

But…

The Multiverse reality


  • The result is just one specific path among thousands
  • There were >1,000 other equally valid ways to analyze this data
  • This was just a single snapshot of the entire Meta-Analytic Multiverse
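How a few defensible choices multiply into many analyses can be sketched as a Cartesian product of options. The grid below is hypothetical: the factor names echo the selection criteria and statistical specifications listed earlier, but the alternative levels (such as `"all"`, `"DL"`, or `"z"`) are assumptions for illustration and do not reproduce the actual multiverse.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
FACTORS = {
    "type": ["not-cbt-based", "cbt-based", "all"],
    "control": ["cau", "wl", "other ctr", "all"],
    "rob": ["low", "all"],
    "tau2_estimator": ["REML", "DL"],
    "test": ["knapp-hartung", "z"],
}


def build_grid(factors):
    """Return one dict per specification: every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


grid = build_grid(FACTORS)
print(len(grid))  # 3 * 4 * 2 * 2 * 2 = 96 specifications
```

Each dictionary in `grid` describes one path through the multiverse; the single analysis reported earlier corresponds to just one of them.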

Inferential Multiverse Meta-Analysis

  • If we compute the entire multiverse of >1,000 plausible meta-analyses
  • And adjust for the >1,000 tests (Multiple comparisons correction)
  • Result: the original significant finding (\(p = 0.02\)) does not survive

\(\Rightarrow\) Adjusted p-value = 0.07 (Non-significant)

Inferential Multiverse Meta-Analysis

Case Study

PIMMA Case Study

Dataset

  • RCTs on psychotherapy effectiveness for depression (Plessen et al., 2023)

  • k = 124

  • Population = adults

PIMMA Case Study

Results

Summary effects (k = 1144)



Median = 0.59

Mean (\(\bar{x}\)) = 0.63

Min-Max = [0.28, 1.61]

Clinical Significance \(\geq\) 0.24
(Cuijpers et al., 2014)

P-value Adjustment

Before correction:

  • Non-significant: 8 (0.7%)
  • Significant: 1136 (99.3%)

After correction:

  • Never significant: 8 (0.7%)
  • Non-survivors (significant only before correction): 106 (9.3%)
  • Survivors (still significant after correction): 1030 (90.0%)
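The three categories can be expressed as a simple rule on the raw and adjusted p-values. This is a minimal sketch assuming a threshold of α = .05 and the reading that a "non-survivor" is a result significant before correction but not after; `classify` is a hypothetical helper. The counts are the ones reported above.

```python
def classify(p_raw, p_adj, alpha=0.05):
    """Label one specification by its raw and adjusted p-values."""
    if p_raw >= alpha:
        return "never significant"
    if p_adj >= alpha:
        return "non-survivor"
    return "survivor"


# Counts reported for the case study.
counts = {"never significant": 8, "non-survivor": 106, "survivor": 1030}
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```

For instance, the single analysis shown earlier (raw p = 0.02, adjusted p = 0.07) falls in the non-survivor category under this rule.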

Implications

  • Inferential Multiverse Meta-Analysis

    → useful tool to enhance transparency and robustness of evidence

  • Addressing selective reporting and p-hacking
  • Relative stability of psychotherapies' efficacy for depression

Limitations

  • Simplification of dataset and analyses
  • No multilevel meta-analyses
  • No quantitative assessment of publication bias

Future directions

  • PIMMA to consolidate knowledge and evidence in psychology
  • Extend the method to multilevel and/or multivariate meta-analyses
  • R package on the way!

Data and analyses available
on GitHub and Open Science Framework

manentematteo.github.io/resources

For info: matteo.manente.3@phd.unipd.it

Multiverse Meta-Analysis Takeaways

  • Be thoughtful

    → Include only equally defensible choices (Del Giudice & Gangestad, 2021)

  • Be parsimonious

    → Include only well-justified models to preserve statistical power

  • Be exhaustive

    → Account for all relevant variables to avoid inflated false positive rates